Product and content association

ABSTRACT

Methods and apparatus are disclosed regarding an e-commerce system that maintains references between products and relevant content. In some embodiments, methods and/or apparatus obtain content from one or more content providers via a computer network, identify a product from a product catalog of an electronic database that is related to the obtained content, and update references to relevant content maintained in an electronic database for the product to include a reference to the obtained content.

FIELD OF THE INVENTION

Various embodiments relate to electronic commerce (e-commerce), and more particularly, to providing information for products sold in an e-commerce environment.

BACKGROUND OF THE INVENTION

Electronic commerce (e-commerce) websites are an increasingly popular venue for consumers to research and purchase products without physically visiting a conventional brick-and-mortar retail store. An e-commerce website may provide a vast array of products and/or services which customers may purchase from the website. In order to aid the customer in making informed purchase decisions, the e-commerce website may maintain and present to its customers various types of information about each offered product and/or service such as, for example, technical specifications, pictures, video demonstrations, customer reviews, etc.

A vast amount of information for any given product or service may generally be found on the Internet. In particular, various websites regularly feature in-depth product reviews, product commentaries, product comparisons, purchasing advice for product categories, product demonstrations, etc. that may aid a customer in making a purchasing decision. However, many customers may not have the time, desire, and/or ability to find the most relevant information for products of interest. Accordingly, an e-commerce website that is able to readily provide such information may provide a service that may both drive sales and increase customer loyalty.

Limitations and disadvantages of conventional and traditional approaches should become apparent to one of skill in the art, through comparison of such systems with aspects of the present invention as set forth in the remainder of the present application.

BRIEF SUMMARY OF THE INVENTION

Apparatus and methods of associating products with relevant content are shown in and/or described in connection with at least one of the figures, and are set forth more completely in the claims.

These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.

BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 shows an e-commerce environment comprising a computing device and an e-commerce system in accordance with an embodiment of the present invention.

FIG. 2 shows an embodiment of a computing device for use in the e-commerce environment of FIG. 1.

FIG. 3 shows user profiles and a product catalog maintained by an e-commerce system of FIG. 1.

FIG. 4 shows an embodiment of a product listing provided by the e-commerce system of FIG. 1.

FIG. 5 shows a flowchart for an embodiment of an example process that may be used by the e-commerce system of FIG. 1 to associate content with a product.

FIG. 6 shows a flowchart for an embodiment of another example process that may be used by the e-commerce system of FIG. 1 to associate content with a product.

DETAILED DESCRIPTION OF THE INVENTION

Aspects of the present invention are related to associating relevant content to products offered by an e-commerce site. More specifically, certain embodiments of the present invention relate to apparatus, hardware and/or software systems, and associated methods that analyze content from a plurality of content providers and associate products of an e-commerce site with relevant content based on such analysis.

Referring now to FIG. 1, an e-commerce environment 10 is depicted. As shown, the e-commerce environment 10 may include a computing device 20 connected to an e-commerce system 30 via a computer network 40. The network 40 may include a number of private and/or public networks such as, for example, wireless and/or wired LAN networks, cellular networks, and the Internet that collectively provide a communication path and/or paths between the computing device 20 and the e-commerce system 30. The computing device 20 may include a desktop, a laptop, a tablet, a smart phone, and/or some other type of computing device which enables a user to communicate with the e-commerce system 30 via the network 40. The e-commerce system 30 may include one or more web servers, database servers, routers, load balancers, and/or other computing and/or networking devices that operate to provide an e-commerce experience for users that connect to the e-commerce system 30 via the computing device 20 and the network 40.

The e-commerce system 30 may further include a content aggregator 33 and one or more electronic databases 37 configured to store data used by the content aggregator 33 such as product catalog 300, product associations 320, and customer profiles 330. The content aggregator 33 may include one or more firmware and/or software instructions, routines, modules, etc. that the e-commerce system 30 may execute in order to extract content from one or more content providers and associate the extracted content with appropriate products and/or services provided by the e-commerce system 30. Further details regarding the content aggregator 33 are presented below with respect to FIGS. 5 and 6.

FIG. 1 depicts a simplified embodiment of the e-commerce environment 10 which may be implemented in numerous different manners using a wide range of different computing devices, platforms, networks, etc. Moreover, while aspects of the e-commerce environment 10 may be implemented using a client/server architecture, aspects of the e-commerce environment 10 may also be implemented using a peer-to-peer architecture or another networking architecture.

As noted above, the e-commerce system 30 may include one or more computing devices. FIG. 2 depicts an embodiment of a computing device 50 suitable for the computing device 20 and/or the e-commerce system 30. As shown, the computing device 50 may include a processor 51, a memory 53, a mass storage device 55, a network interface 57, and various input/output (I/O) devices 59. The processor 51 may be configured to execute instructions, manipulate data and generally control operation of other components of the computing device 50 as a result of its execution. To this end, the processor 51 may include a general purpose processor such as an x86 processor or an ARM processor which are available from various vendors. However, the processor 51 may also be implemented using an application specific processor and/or other logic circuitry.

The memory 53 may store instructions and/or data to be executed and/or otherwise accessed by the processor 51. In some embodiments, the memory 53 may be completely and/or partially integrated with the processor 51.

In general, the mass storage device 55 may store software and/or firmware instructions which may be loaded in memory 53 and executed by processor 51. The mass storage device 55 may further store various types of data which the processor 51 may access, modify, and/or otherwise manipulate in response to executing instructions from memory 53. To this end, the mass storage device 55 may comprise one or more redundant array of independent disks (RAID) devices, traditional hard disk drives (HDD), solid-state drives (SSD), flash memory devices, read only memory (ROM) devices, etc.

The network interface 57 may enable the computing device 50 to communicate with other computing devices directly and/or via network 40. In particular, the network interface 57 may permit the processor 51 to obtain content from content providers via network 40. To this end, the network interface 57 may include a wired networking interface such as an Ethernet (IEEE 802.3) interface, a wireless networking interface such as a WiFi (IEEE 802.11) interface, a radio or mobile interface such as a cellular interface (GSM, CDMA, LTE, etc.), and/or some other type of networking interface capable of providing a communications link between the computing device 50 and network 40 and/or another computing device.

Finally, the I/O devices 59 may generally provide devices which enable a user to interact with the computing device 50 by receiving information from the computing device 50 and/or providing information to the computing device 50. For example, the I/O devices 59 may include display screens, keyboards, mice, touch screens, microphones, audio speakers, etc.

While the above provides general aspects of a computing device 50, those skilled in the art will readily appreciate that there may be significant variation in actual implementations of a computing device. For example, a smart phone implementation of a computing device may use vastly different components and may have a vastly different architecture than a database server implementation of a computing device. However, despite such differences, computing devices generally include processors that execute software and/or firmware instructions in order to implement various functionality. As such, aspects of the present application may find utility across a vast array of different computing devices and the intention is not to limit the scope of the present application to a specific computing device and/or computing platform beyond any such limits that may be found in the appended claims.

As part of the provided e-commerce experience, the e-commerce system 30 may enable customers, which may be guests or members of the e-commerce system 30, to browse and/or otherwise locate products. The e-commerce system 30 may further enable such customers to purchase products and/or services offered for sale. To this end, the e-commerce system 30 may maintain an electronic database or catalog 300 which may be stored on an associated mass storage device 55. As shown in FIG. 3, the catalog 300 may include listings 310 for each product and/or service available for purchase. Each listing 310 may include various information or attributes regarding the respective product and/or service, such as a unique product identifier (e.g., stock-keeping unit “SKU”), a product description, product image(s), manufacturer information, available quantity, price, product features, etc. Moreover, while the e-commerce system 30 may enable guests to purchase products and/or services without registering and/or otherwise signing up for a membership, the e-commerce system 30 may provide additional and/or enhanced functionality to those users that become members.

To this end, the e-commerce system 30 may enable members to create a customer profile 330. As shown, a customer profile 330 may include personal information 331, purchase history data 335, and other customer activity data 337. The personal information 331 may include such items as name, mailing address, email address, phone number, billing information, clothing sizes, birthdates of friends and family, etc. The purchase history data 335 may include information regarding products previously purchased by the customer from the e-commerce system 30. The purchase history data 335 may further include products previously purchased from affiliated online and brick-and-mortar vendors.

The other customer activity data 337 may include information regarding prior customer activities such as products for which the customer has previously searched, products which the customer has previously viewed, products for which the customer has provided comments, products which the customer has rated, products for which the customer has written reviews, and/or products which the customer has purchased from the e-commerce system 30. The other customer activity data 337 may further include similar activities associated with affiliated online and brick-and-mortar vendors.

As part of the e-commerce experience, the e-commerce system 30 may cause a computing device 20 to display a product listing 310 as shown in FIG. 4. In particular, the e-commerce system 30 may provide such a product listing 310 in response to a member browsing products by type, price, kind, etc., viewing a list of products obtained from a product search, and/or other techniques supported by the e-commerce system 30 for locating products of interest. As shown, the product listing 310 may include one or more representative images 350 of the product as well as a product description 360. The product listing 310 may further include one or more hyperlinks and/or other references 370 to additional information associated with the product and/or service. In particular, the content aggregator 33 may analyze content provided by many different content providers such as websites, blogs, etc., identify which content is relevant to a particular product, and associate the relevant content to the product.

Referring now to FIG. 5, an example method 500 is shown that may be used by the content aggregator 33 to analyze content and associate such content with products. At 510, the content aggregator 33 may obtain or otherwise collect content from various content providers on the Internet. In particular, the content aggregator 33 may subscribe to various RSS (Rich Site Summary or Really Simple Syndication) feeds in order to receive RSS documents from such RSS feeds. A content provider such as a website may provide RSS feeds to publish RSS documents for frequently updated information of the website such as, for example, blog entries, news headlines, audio, and video. The RSS documents received by the content aggregator 33 may include full text or summarized text of the updated content and may further include metadata for the updated content such as publishing date and author's name. Thus, by subscribing to various RSS feeds, the content aggregator 33 may automatically receive RSS documents from publishers without requiring the content aggregator 33 to poll or otherwise periodically check the content of the corresponding content provider.
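
By way of illustration only, the kind of RSS document received at 510 could be parsed as in the following minimal Python sketch, which uses only the standard library. The sample feed, the helper name parse_rss, and the keys of the returned dictionaries are hypothetical and are not part of the disclosed embodiments.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, inlined so the sketch needs no network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Gadget Blog</title>
    <item>
      <title>Acme X100 camera review</title>
      <link>http://example.com/acme-x100-review</link>
      <description>Hands-on with the Acme X100.</description>
      <pubDate>Mon, 06 Jan 2014 10:00:00 GMT</pubDate>
      <author>jane@example.com (Jane Doe)</author>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dictionary per <item>, holding its text and metadata."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
            "published": item.findtext("pubDate", default=""),
            "author": item.findtext("author", default=""),
        })
    return items


documents = parse_rss(SAMPLE_RSS)
```

Each resulting dictionary carries the summarized text together with the publishing date and author metadata mentioned above.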

Besides RSS feeds, the content aggregator 33 may obtain further content by polling websites of interest for relevant content. To this end, the content aggregator 33 may maintain a list of websites to periodically check for new content. The content aggregator 33 may then crawl or traverse such websites for content in a manner similar to web crawlers used by web search engines.

At 515, the content aggregator 33 may assign categories to content obtained at 510. For example, the content aggregator 33 may assign a category or categories to each received RSS document based on its URL (Uniform Resource Locator), title of the content, main text of the content, etc. In particular, the content aggregator 33 may maintain a list of categories for the products of the product catalog 300 and categorize such RSS documents accordingly.

The content aggregator 33 at 520 may analyze the content to extract relevant phrases. For example, the content aggregator 33 may extract the main text of the obtained content using various classification algorithms, shallow text processing, metadata parsing, etc. The content aggregator 33 may further use the Stanford Named Entity Recognizer (SNER), the OpenNLP library, and/or other natural language processing techniques to extract relevant phrases from the obtained content. In particular, the content aggregator 33 may use SNER to label sequences of words in the content which are the names of things, such as persons, organizations, company names, and/or locations. The content aggregator 33 may further use the OpenNLP natural language processor to perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, and parsing of the obtained content. In particular, the content aggregator 33 at 520 via such tools may extract trademark product names from the content in order to better ascertain to which products of the product catalog 300 the obtained content relates.

The content aggregator 33 at 520 may further look for entities not extracted by the SNER or OpenNLP tools. To this end, the content aggregator 33 may maintain a list of names, phrases, etc. to match against the obtained content in order to determine whether such content includes such names, phrases, etc.

Conversely, the content aggregator 33 at 530 may remove blacklisted phrases from phrases obtained at 520. To this end, the content aggregator 33 may maintain a list of names, phrases, etc. and remove such names, phrases, etc. from the phrases extracted at 520. In this manner, a technician or other employee may tweak and fine tune the results of the phrase extraction by removing phrases that routinely provide false associations between content and products.
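
For illustration, the list matching at 520 and the blacklist removal at 530 might be sketched in Python as follows. The function names and sample phrases are hypothetical, and case-insensitive substring matching is an assumption made to keep the sketch short; an actual embodiment may match on token boundaries instead.

```python
def match_listed_phrases(text, listed_phrases):
    """Return the maintained names/phrases that occur in the content.

    Assumption: a case-insensitive substring test stands in for whatever
    matching an actual embodiment uses.
    """
    lowered = text.lower()
    return [phrase for phrase in listed_phrases if phrase.lower() in lowered]


def remove_blacklisted(phrases, blacklist):
    """Drop extracted phrases that appear on the blacklist (case-insensitive)."""
    blocked = {entry.lower() for entry in blacklist}
    return [phrase for phrase in phrases if phrase.lower() not in blocked]


content = "The Acme X100 beats the Widget Pro in low light."
extra = match_listed_phrases(content, ["Widget Pro", "Gizmo 3"])
kept = remove_blacklisted(["Acme X100", "Low Light", "Widget Pro"], ["low light"])
```

Here the phrase "Low Light" is dropped because a technician has placed it on the blacklist, while the remaining phrases pass through unchanged and in order.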

At 540, the content aggregator 33 may rank the remaining phrases based on a weighted term frequency. In particular, the content aggregator 33 may rank the remaining phrases based not only upon how frequently such phrases occur in the content but also on the position of such phrases in the content. For example, the content aggregator 33 may give terms used in the title of the content the greatest weight, terms used in the first paragraph the next greatest weight, etc. The content aggregator 33 may further adjust the weight of a term based on how often the term appears in other documents.
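
One possible reading of the ranking at 540 is sketched below in Python. The specific weights (3, 2, 1), the function names, and the logarithmic damping for phrases that also appear in other documents are all assumptions chosen for illustration; the description above does not fix particular values or formulas.

```python
import math

# Assumed position weights: title > first paragraph > remaining paragraphs.
TITLE_WEIGHT = 3.0
FIRST_PARAGRAPH_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def count_occurrences(phrase, text):
    """Case-insensitive count of non-overlapping occurrences."""
    return text.lower().count(phrase.lower())


def weighted_term_frequency(phrase, title, paragraphs, other_documents=()):
    """Score a phrase by frequency, position, and rarity in other documents."""
    score = TITLE_WEIGHT * count_occurrences(phrase, title)
    for index, paragraph in enumerate(paragraphs):
        weight = FIRST_PARAGRAPH_WEIGHT if index == 0 else BODY_WEIGHT
        score += weight * count_occurrences(phrase, paragraph)
    # Assumed inverse-document-frequency style factor: phrases found in
    # many other documents receive a smaller multiplier.
    containing = sum(1 for doc in other_documents if phrase.lower() in doc.lower())
    rarity = math.log((1 + len(other_documents)) / (1 + containing)) + 1.0
    return score * rarity


def rank_phrases(phrases, title, paragraphs, other_documents=()):
    """Return (phrase, score) pairs sorted from highest to lowest score."""
    scored = [
        (p, weighted_term_frequency(p, title, paragraphs, other_documents))
        for p in phrases
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_phrases(
    ["camera", "Acme X100"],
    title="Acme X100 review",
    paragraphs=[
        "The Acme X100 is a compact camera.",
        "Battery life is average; the camera feels solid.",
    ],
)
```

With no other documents supplied, the rarity factor is 1, so "Acme X100" scores 5 (once in the title, once in the first paragraph) and "camera" scores 3 (once in the first paragraph, once in the body).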

The content aggregator 33 at 550 may select phrases with a score greater than a threshold level. To this end, the content aggregator 33 may sort the phrases based on their weighted term frequency scores. The content aggregator 33 may then select all such phrases having a score greater than a specified minimum threshold score or may select the top specified percentage (e.g., the top 20%) of phrases in the sorted list.
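
The two selection rules at 550 can be sketched in Python as follows. The function names are hypothetical, the input is assumed to be a list of (phrase, score) pairs, and rounding the percentage cut-off up is an assumption, since the description does not say how fractional counts are handled.

```python
import math


def select_above_threshold(scored_phrases, min_score):
    """Keep phrases whose score is strictly greater than min_score."""
    return [phrase for phrase, score in scored_phrases if score > min_score]


def select_top_percentage(scored_phrases, percentage):
    """Keep the top `percentage` percent of phrases by score.

    Assumption: the cut-off count is rounded up, so a non-empty list
    with a positive percentage always yields at least one phrase.
    """
    ordered = sorted(scored_phrases, key=lambda pair: pair[1], reverse=True)
    keep = math.ceil(len(ordered) * percentage / 100)
    return [phrase for phrase, _ in ordered[:keep]]


scores = [("a", 5.0), ("b", 3.0), ("c", 1.0), ("d", 0.5), ("e", 0.2)]
by_threshold = select_above_threshold(scores, 1.0)
by_percentage = select_top_percentage(scores, 20)
```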

After 550, the content aggregator 33 has a list of phrases which are likely the most relevant phrases for the content. The content aggregator 33 then at 560 searches through the product catalog 300 to identify products which match the selected phrases. Using the metadata of the article and products (e.g., category), the content aggregator 33 may remove irrelevant products. Upon finding a match, the content aggregator 33 at 570 may update the product associations 320 of the product catalog 300 to include a reference (e.g., a hyperlink with descriptive link text) to the content. In this manner the content aggregator 33 may automatically collect lists of references 320 to relevant content for its products in the product catalog 300.
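
As a rough sketch of 560 and 570, the following Python example matches selected phrases against product descriptions, filters by category, and records a reference on each matching listing. The Listing class, its field names, and the sample data are hypothetical, and substring matching against the description is an assumption rather than the disclosed matching technique.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical stand-in for a product listing 310 and its references."""
    sku: str
    description: str
    category: str
    references: list = field(default_factory=list)


def associate_content(catalog, phrases, content_categories, reference):
    """Attach `reference` to listings that match a phrase and share a category.

    Returns the SKUs of the listings that were matched.
    """
    matched = []
    for listing in catalog:
        # Category metadata is used to discard irrelevant products.
        if listing.category not in content_categories:
            continue
        description = listing.description.lower()
        if any(phrase.lower() in description for phrase in phrases):
            if reference not in listing.references:
                listing.references.append(reference)
            matched.append(listing.sku)
    return matched


catalog = [
    Listing("SKU-1", "Acme X100 compact camera", "cameras"),
    Listing("SKU-2", "Acme X100 carrying case", "accessories"),
    Listing("SKU-3", "Widget Pro tripod", "cameras"),
]
link = "http://example.com/acme-x100-review"
matched_skus = associate_content(catalog, ["Acme X100"], {"cameras"}, link)
```

In this example the carrying case also mentions the phrase, but it is excluded because its category does not match the category assigned to the content.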

Referring now to FIG. 6, another example method 600 is shown that may be used by the content aggregator 33 to analyze content and associate such content with products. At 610, the content aggregator 33 may obtain or otherwise collect content from various content providers on the Internet. In particular, the content aggregator 33 may obtain such content via RSS feeds, polling, and/or crawling in a manner similar to that described above in regard to step 510 of method 500.

At 620, the content aggregator 33 may extract the main text of the obtained content using various classification algorithms, shallow text processing, metadata parsing, etc. The content aggregator 33 then at 630 may analyze the content to extract the context of the content. For example, the content aggregator 33 may extract the context of the content using a natural language processing technique such as, for example, Latent Dirichlet Allocation (LDA) using a set of topics or categories such as, for example, Wikipedia tags. As a result of such processing, the content aggregator 33 may express the context of each obtained document as a sparse probability distribution over the set of topics.

At 640, the content aggregator 33 may extract the context of each product in the product catalog 300. In particular, the content aggregator 33 may extract such context in a manner similar to that used at 630 to extract the context of the content. For example, the content aggregator 33 may use LDA natural language processing and Wikipedia tags to obtain for each product a sparse probability distribution of its product listing 310 over the Wikipedia tags.

Using the extracted contexts, the content aggregator 33 at 650 may generate distance measures between the probability distributions of the content and each product of the catalog 300. The content aggregator 33 may use various distance measures such as Euclidean distance, Chebyshev distance, Jaccard distance, etc. to obtain such distance measures.
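
The named distance measures can be sketched in Python as follows, with each sparse distribution represented as a dictionary mapping a topic to its probability (absent topics are taken as zero). The function names and sample distributions are hypothetical. Treating Jaccard distance as one minus the overlap of the two sets of non-zero topics is an assumption, since the description does not state how it is applied to distributions.

```python
import math


def euclidean(p, q):
    """Square root of the summed squared per-topic differences."""
    topics = set(p) | set(q)
    return math.sqrt(sum((p.get(t, 0.0) - q.get(t, 0.0)) ** 2 for t in topics))


def chebyshev(p, q):
    """Largest absolute per-topic difference."""
    topics = set(p) | set(q)
    return max((abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics), default=0.0)


def jaccard_distance(p, q):
    """1 - |A & B| / |A | B| over the topics with non-zero probability."""
    a = {t for t, prob in p.items() if prob > 0}
    b = {t for t, prob in q.items() if prob > 0}
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


content_topics = {"cameras": 0.5, "travel": 0.5}
product_topics = {"cameras": 0.5, "cooking": 0.5}
d_euclid = euclidean(content_topics, product_topics)
d_cheb = chebyshev(content_topics, product_topics)
d_jacc = jaccard_distance(content_topics, product_topics)
```

For the sample distributions, the two differ on "travel" and "cooking" by 0.5 each, giving a Euclidean distance of the square root of 0.5, a Chebyshev distance of 0.5, and a Jaccard distance of 2/3 (one shared topic out of three).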

Based on such distance measures, the content aggregator 33 at 660 may determine to which products the content is most related. In particular, the content aggregator 33 may select the product with the smallest distance, the products with the smallest distances, and/or the products with a distance smaller than a threshold distance. The content aggregator 33 may also sort the products based on their distance measures, and select a predefined percentage of the products having the smallest distance measures.
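
The selection alternatives at 660 might be sketched in Python as follows. The function name, the parameter names, and the dictionary of SKU-to-distance values are hypothetical; rounding the percentage cut-off up is likewise an assumption.

```python
import math


def select_products(distances, max_distance=None, top_percentage=None):
    """Select product identifiers from a mapping of identifier -> distance.

    With no options, only the single nearest product is returned.
    `max_distance` keeps products strictly closer than the threshold;
    `top_percentage` keeps that share of the (remaining) nearest products.
    """
    ordered = sorted(distances.items(), key=lambda pair: pair[1])
    if max_distance is not None:
        ordered = [(sku, d) for sku, d in ordered if d < max_distance]
    if top_percentage is not None:
        keep = math.ceil(len(ordered) * top_percentage / 100)
        ordered = ordered[:keep]
    elif max_distance is None:
        ordered = ordered[:1]
    return [sku for sku, _ in ordered]


sample = {"SKU-A": 0.2, "SKU-B": 0.7, "SKU-C": 0.4, "SKU-D": 0.9}
nearest = select_products(sample)
within = select_products(sample, max_distance=0.5)
top_half = select_products(sample, top_percentage=50)
```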

The content aggregator 33 then at 670 may update the product associations 320 for the selected products to include a reference (e.g., a hyperlink with descriptive link text) to the content. In this manner the content aggregator 33 may automatically collect lists of references 320 to relevant content for its products in the product catalog 300.

Various embodiments of the invention have been described herein by way of example and not by way of limitation in the accompanying figures. For clarity of illustration, exemplary elements illustrated in the figures may not necessarily be drawn to scale. In this regard, for example, the dimensions of some of the elements may be exaggerated relative to other elements to provide clarity. Furthermore, where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

Moreover, certain embodiments may be implemented as a plurality of instructions on a non-transitory, computer readable storage medium such as, for example, flash memory devices, hard disk devices, compact disc media, DVD media, EEPROMs, etc. Such instructions, when executed by one or more computing devices, may result in the one or more computing devices identifying relevant content for a particular product or service and associating the relevant content with the product or service.

While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. For example, while the above processes were described primarily from the standpoint of associating products with relevant textual content, similar processes may also be used to associate products with non-textual content (e.g., pictures, videos, audio, etc.) using similar analytical techniques to analyze metadata associated with the non-textual content and/or to analyze the non-textual content itself to determine its contextual relevance. Therefore, it is intended that the present invention not be limited to the particular embodiment or embodiments disclosed, but that the present invention encompasses all embodiments falling within the scope of the appended claims.

What is claimed is:
 1. A computer-implemented method, comprising: obtaining content from one or more content providers via a computer network; identifying a product from a product catalog of an electronic database that is related to the obtained content; and updating the electronic database to include for the product a reference to the obtained content.
 2. The computer-implemented method of claim 1, further comprising presenting a customer with a product listing for the product that comprises the reference to the obtained content.
 3. The computer-implemented method of claim 1, wherein said identifying comprises: extracting relevant phrases from the content; ranking the phrases based on weighted term frequency; selecting phrases based on their weighted term frequency; and selecting the product from the product catalog based on the selected phrases.
 4. The computer-implemented method of claim 3, further comprising removing blacklisted phrases from the extracted phrases prior to said ranking.
 5. The computer-implemented method of claim 1, wherein said identifying comprises: extracting context from the content based on natural language processing and a set of topics to obtain a distribution for the content across the set of topics; extracting context for the product based on applying natural language processing and a set of topics to its product listing to obtain a distribution for the product listing across the set of topics; obtaining a distance measure between the distribution for the content and the distribution for the product listing; and selecting the product based on the distance measure.
 6. The computer-implemented method of claim 1, wherein the natural language processing uses Latent Dirichlet Allocation to obtain the distribution for the content and the distribution for the product.
 7. A non-transitory computer-readable medium, comprising a plurality of instructions, that in response to being executed, result in a computing device: obtaining content from one or more content providers via a computer network; identifying a product from a product catalog of an electronic database that is related to the obtained content; and updating the electronic database to include for the product a reference to the obtained content.
 8. The non-transitory computer-readable medium of claim 7, further comprising instructions that result in the computing device presenting a customer with a product listing for the product that comprises the reference to the obtained content.
 9. The non-transitory computer-readable medium of claim 7, further comprising instructions that result in the computing device: extracting relevant phrases from the content; ranking the phrases based on weighted term frequency; selecting phrases based on their weighted term frequency; and selecting the product from the product catalog based on the selected phrases.
 10. The non-transitory computer-readable medium of claim 9, further comprising instructions that result in the computing device removing blacklisted phrases from the extracted phrases prior to ranking the phrases.
 11. The non-transitory computer-readable medium of claim 7, further comprising instructions that result in the computing device: extracting context from the content based on natural language processing and a set of topics to obtain a distribution for the content across the set of topics; extracting context for the product based on applying natural language processing and a set of topics to its product listing to obtain a distribution for the product listing across the set of topics; obtaining a distance measure between the distribution for the content and the distribution for the product listing; and selecting the product based on the distance measure.
 12. The non-transitory computer-readable medium of claim 11, further comprising instructions that result in the computing device performing the natural language processing in accordance with Latent Dirichlet Allocation to obtain the distribution for the content and the distribution for the product.
 13. A computing device, comprising: a network interface to a computer network; an electronic database comprising a product catalog having a plurality of product listings for a plurality of products; and a processor configured to: obtain content from one or more content providers via the network interface; identify a product from the product catalog that is related to the obtained content; and update the electronic database to include for the identified product a reference to the obtained content.
 14. The computing device of claim 13, wherein the processor is further configured to present, via the network interface, a product listing for the product that comprises the reference to the obtained content.
 15. The computing device of claim 13, wherein the processor is further configured to: extract relevant phrases from the content; rank the phrases based on weighted term frequency; select phrases based on their weighted term frequency; and select the product from the product catalog based on the selected phrases.
 16. The computing device of claim 15, wherein the processor is further configured to remove blacklisted phrases from the extracted phrases prior to ranking the phrases.
 17. The computing device of claim 13, wherein the processor is further configured to: extract context from the content based on natural language processing and a set of topics to obtain a distribution for the content across the set of topics; extract context for the product based on applying natural language processing and a set of topics to its product listing to obtain a distribution for the product listing across the set of topics; obtain a distance measure between the distribution for the content and the distribution for the product listing; and select the product based on the distance measure.
 18. The computing device of claim 17, wherein the processor is further configured to perform the natural language processing in accordance with Latent Dirichlet Allocation to obtain the distribution for the content and the distribution for the product.